Pnomolos/faster #2

pnomolos · 2017-06-04T04:55:49Z

@joelparkerhenderson After running way too many benchmarks and trying to come up with far too many possible algorithms I believe I have a winner - results here.

I tested the various algorithms (using benchmark-ipsa) against several sources (wikipedia articles and project gutenberg books). Not included in the benchmarks are also some tests against single characters, and against the entire set of characters. I realized we should probably add tests for various encodings as well, I'll try and add those in soonish.

The libunac implementation also blows everything else away (as I expected) but it needs some cleanup and a few more checks.

As well, per my comment (#1 (comment)) should I change the output of those two characters?

Algorithm three was a very interesting one as you'll notice from the results.

  def unaccent_three
    @unaccent_three_regex ||= Regexp.union(ACCENTMAP.keys)
    gsub(@unaccent_three_regex, ACCENTMAP)
  end

very low memory usage (as I believe it would be a single allocation in C) but speed results were all over the place. I'm curious if there's some sort of cutoff in terms of string length where we could alternate between the two, but I haven't had a chance to investigate that far.

…n up Rakefile, etc.

pnomolos · 2017-06-04T05:13:51Z

@joelparkerhenderson Testing in all encodings is turning out to be more of a pain than I expected. What are your thoughts on this? Should unaccent only work on UTF-8 (or UTF-16, I'd imagine) strings, or should it operate on as many encodings as possible?

pnomolos · 2017-06-05T17:17:30Z

@joelparkerhenderson I've pulled in your latest changes and updated the benchmark.

redtachyons · 2017-08-05T07:48:05Z

Is it possible to get this PR merged ASAP, addressable gem using this as a dependency and this takes lot of memory. I found this pr when when I was about to raise one with some optimizations which are already done in this PR
Thanks in advance

Ref redtachyons@2da4a4b

joelparkerhenderson · 2017-08-10T13:06:23Z

Thanks! I'll go through these today & tomorrow. First glance it all looks fine.

joelparkerhenderson · 2017-08-10T13:30:22Z

@pnomolos You're asking great questions. I propose that a reliable way for now is to do just UTF8, and also append that to the gem name i.e. name the gem sixarm_ruby_unaccent_utf8_libunac. I've talked with a couple people who prefer the greater specificity. I'll let you make the call; what would you like?

redtachyons · 2017-08-10T14:09:34Z

BTW it is better to gitignore coverage files

joelparkerhenderson · 2017-08-10T14:18:31Z

@redtachyons Thanks, I'll ignore /coverage/ and /coverage.data, in an independent commit. I'll also change from listing the dependencies in Gemfile to using the "gemspec" line as above, in an independent commit. And also ignore the Gemfile.lock file.

pnomolos · 2022-07-05T17:18:28Z

@joelparkerhenderson We're still using this gem internally - if I rework this commit would you consider pulling it in and cutting a new version of the gem?

joelparkerhenderson · 2022-07-05T17:57:04Z

Yes. What version of Ruby are you using? What I suggest, if you want, is I can get the gem up to date with Ruby 3. I have time this weekend. Then could you add your improvements on top of that? (Or do you prefer a different way?)

pnomolos · 2022-07-05T22:47:43Z

Hi Joel,

If you have time to do that it would be great, as we're hoping to move to Ruby 3 sometime later this year.

Thanks!

pnomolos · 2022-07-22T23:22:40Z

Hi @joelparkerhenderson if you don't have bandwidth to get to this I'd be fine doing two PRs - one for Ruby 3 support and an updated one for this. Would that be OK with you?

Thanks,

Phil

pnomolos added 3 commits June 3, 2017 21:44

Faster and less allocations algorithm

7ce8123

Add test to check every character

fc08234

Updated Gemfile to use gemspec, added ruby 2.3.0 to .travis.yml, clea…

617be33

…n up Rakefile, etc.

pnomolos added 3 commits June 5, 2017 09:26

Merge remote-tracking branch 'upstream/master' into pnomolos/faster

d3e3ed1

Fix post-merge Gemfile.lock issue

295f10f

Updated coverage

eaa2ab1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pnomolos/faster #2

Pnomolos/faster #2

pnomolos commented Jun 4, 2017 •

edited

Loading

pnomolos commented Jun 4, 2017

pnomolos commented Jun 5, 2017

redtachyons commented Aug 5, 2017 •

edited

Loading

joelparkerhenderson commented Aug 10, 2017

joelparkerhenderson commented Aug 10, 2017

redtachyons commented Aug 10, 2017

joelparkerhenderson commented Aug 10, 2017

pnomolos commented Jul 5, 2022

joelparkerhenderson commented Jul 5, 2022

pnomolos commented Jul 5, 2022

pnomolos commented Jul 22, 2022

Pnomolos/faster #2

Are you sure you want to change the base?

Pnomolos/faster #2

Conversation

pnomolos commented Jun 4, 2017 • edited Loading

pnomolos commented Jun 4, 2017

pnomolos commented Jun 5, 2017

redtachyons commented Aug 5, 2017 • edited Loading

joelparkerhenderson commented Aug 10, 2017

joelparkerhenderson commented Aug 10, 2017

redtachyons commented Aug 10, 2017

joelparkerhenderson commented Aug 10, 2017

pnomolos commented Jul 5, 2022

joelparkerhenderson commented Jul 5, 2022

pnomolos commented Jul 5, 2022

pnomolos commented Jul 22, 2022

pnomolos commented Jun 4, 2017 •

edited

Loading

redtachyons commented Aug 5, 2017 •

edited

Loading